
    Benchmarking unsupervised near-duplicate image detection

    Unsupervised near-duplicate detection has many practical applications, ranging from social media analysis and web-scale retrieval to digital image forensics. It entails running a threshold-limited query on a set of descriptors extracted from the images, with the goal of identifying all possible near-duplicates while limiting the false positives due to visually similar images. Since the rate of false alarms grows with the dataset size, a very high specificity is required, up to 1 - 10^-9 for realistic use cases; this important requirement, however, is often overlooked in the literature. In recent years, descriptors based on deep convolutional neural networks have matched or surpassed traditional feature extraction methods in content-based image retrieval tasks. To the best of our knowledge, ours is the first attempt to establish the performance range of deep learning-based descriptors for unsupervised near-duplicate detection on a range of datasets encompassing a broad spectrum of near-duplicate definitions. We leverage both established and new benchmarks, such as the Mir-Flickr Near-Duplicate (MFND) dataset, in which a known ground truth is provided for all possible pairs over a general, large-scale image collection. To compare the specificity of different descriptors, we reduce the problem of unsupervised detection to that of binary classification of near-duplicate vs. not-near-duplicate images. The latter can be conveniently characterized using Receiver Operating Characteristic (ROC) curves. Our findings in general favor fine-tuning deep convolutional networks over using off-the-shelf features, but differences at high-specificity settings depend on the dataset and are often small. The best performance was observed on the MFND benchmark, achieving 96% sensitivity at a false positive rate of 1.43x10^-6.
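The pipeline described above, thresholding a similarity score between image descriptors and summarizing the resulting binary classifier with ROC points, can be sketched in plain Python. Descriptor extraction is assumed to happen elsewhere; the cosine measure and the fixed threshold are illustrative choices, not the paper's exact setup:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify_pairs(pairs, threshold):
    """Label each descriptor pair near-duplicate (True) or not."""
    return [cosine_similarity(a, b) >= threshold for a, b in pairs]

def roc_point(labels, preds):
    """One (FPR, TPR) point of the ROC curve for a fixed threshold."""
    tp = sum(1 for p, l in zip(preds, labels) if p and l)
    fp = sum(1 for p, l in zip(preds, labels) if p and not l)
    pos = sum(1 for l in labels if l)
    neg = len(labels) - pos
    return fp / neg, tp / pos
```

Sweeping the threshold and collecting one (FPR, TPR) point per value traces the full ROC curve used in the comparison.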

    On the Use of Causal Models to Build Better Datasets

    In recent years, the Machine Learning and Deep Learning communities have devoted many efforts to studying ever-better models and more efficient training strategies. Nonetheless, the fundamental role played by dataset bias in the final behaviour of trained models calls for strong and principled methods to collect, structure and curate datasets prior to training. In this paper, we provide an overview of the use of causal models to achieve a deeper understanding of the structure underlying datasets and to mitigate biases, supported by several real-life use cases from the medical and industrial domains.

    Classification of tagged material in a set of tomographic images of colorectal region

    A method of classification of image portions corresponding to faecal residues from a tomographic image of a colorectal region, which comprises a plurality of voxels (2) each having a predetermined intensity value and which shows at least one portion of colon (6a, 6b, 6c, 6d) comprising at least one area of tagged material (10). The area of tagged material (10) comprises at least one area of faecal residue (10a) and at least one area of tissue affected by tagging (10b). The image further comprises at least one area of air (8) which comprises an area of pure air (8a) not influenced by the faecal residues. The method comprises the operations of identifying (100), on the basis of a predetermined identification criterion based on the intensity values, above-threshold connected regions comprising connected voxels (2) and identifying, within the above-threshold connected regions, a plurality of connected regions of tagged material comprising voxels (2) representing the area of tagged material (10). The method further comprises the operation of classifying (104) each plurality of connected regions of tagged material on the basis of specific classification comparison criteria for each connected region, in such a way as to identify voxels (20) corresponding to the area of faecal residue (10a) and voxels (2) corresponding to the area of tissue affected by tagging (10b).
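The core of the identification step (100) is connected-component labeling of above-threshold voxels. A minimal 2D sketch in plain Python (the patent operates on 3D voxel grids, and the threshold and 4-connectivity here are simplifying assumptions):

```python
def label_regions(grid, threshold):
    """4-connected labeling of above-threshold cells in a 2D grid.
    Returns a label grid (0 = below threshold) and the region count."""
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for i in range(h):
        for j in range(w):
            if grid[i][j] >= threshold and labels[i][j] == 0:
                current += 1  # start a new connected region
                stack = [(i, j)]
                labels[i][j] = current
                while stack:  # flood fill over 4-neighbours
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and grid[ny][nx] >= threshold
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            stack.append((ny, nx))
    return labels, current
```

Each resulting region would then be classified, per step (104), by comparing region-level statistics against the criteria for faecal residue versus tagging-affected tissue.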

    SoccER: Computer graphics meets sports analytics for soccer event recognition

    Automatic event detection from images or wearable sensors is a fundamental step towards the development of advanced sport analytics and broadcasting software. However, the collection and annotation of large-scale sport datasets is hindered by technical obstacles, the cost of data acquisition and annotation, and commercial interests. In this paper, we present the Soccer Event Recognition (SoccER) data generator, which builds upon an existing, high-quality open source game engine to enable synthetic data generation. The software generates detailed spatio-temporal data from simulated soccer games, along with fine-grained, automatically generated event ground truth. The SoccER software suite also includes a complete event detection system, entirely developed and tested on a synthetic dataset comprising 500 minutes of game and more than 1 million events. We close the paper by discussing avenues for future research in sports event recognition enabled by the use of synthetic data.
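To make the idea of recognizing events from spatio-temporal data concrete, here is a hypothetical rule that fires a "pass" event whenever ball possession moves between two different players of the same team. This is an illustrative sketch, not the SoccER system's actual detector, and the data layout is an assumption:

```python
def detect_passes(possession):
    """possession: chronological list of (frame, player_id, team_id)
    tuples for the current ball holder. Emits a 'pass' event whenever
    possession changes between distinct players of the same team."""
    events = []
    prev = None
    for frame, player, team in possession:
        if prev is not None and prev[1] != player and prev[2] == team:
            events.append({"type": "pass", "frame": frame,
                           "from": prev[1], "to": player})
        prev = (frame, player, team)
    return events
```

A full detector would layer many such rules (shots, tackles, saves) on top of the generated positional streams, which is exactly what the synthetic ground truth makes easy to validate.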

    Method of classification of tagged material in a set of tomographic images of colorectal region

    A method of classification of image portions corresponding to fecal residues from a tomographic image of a colorectal region, which comprises a plurality of voxels (2) each having a predetermined intensity value and which shows at least one portion of colon (6a, 6b, 6c, 6d) comprising at least one area of tagged material (10). The area of tagged material (10) comprises at least one area of fecal residue (10a) and at least one area of tissue affected by tagging (10b). The image further comprises at least one area of air (8) which comprises an area of pure air (8a) not influenced by the fecal residues. The method comprises the operations of identifying (100), on the basis of a predetermined identification criterion based on the intensity values, above-threshold connected regions comprising connected voxels (2) and identifying, within the above-threshold connected regions, a plurality of connected regions of tagged material comprising voxels (2) representing the area of tagged material (10). The method further comprises the operation of classifying (104) each plurality of connected regions of tagged material on the basis of specific classification comparison criteria for each connected region, in such a way as to identify voxels (20) corresponding to the area of fecal residue (10a) and voxels (2) corresponding to the area of tissue affected by tagging (10b).

    Breast mass detection with faster R-CNN: On the feasibility of learning from noisy annotations

    In this work we study the impact of noise on the training of object detection networks for the medical domain, and how it can be mitigated by improving the training procedure. Annotating large medical datasets for training data-hungry deep learning models is expensive and time-consuming. Leveraging information that is already collected in clinical practice, in the form of text reports, bookmarks or lesion measurements, would substantially reduce this cost. Obtaining precise lesion bounding boxes through automatic mining procedures, however, is difficult. We provide here a quantitative evaluation of the effect of bounding box coordinate noise on the performance of Faster R-CNN object detection networks for breast mass detection. Varying degrees of noise are simulated by randomly modifying the bounding boxes: in our experiments, bounding boxes could be enlarged up to six times the original size. The noise is injected in the CBIS-DDSM collection, a well-curated public mammography dataset for which accurate lesion location is available. We show how, due to an imperfect matching between the ground truth and the network bounding box proposals, the noise is propagated during training and reduces the ability of the network to correctly distinguish lesions from background. When using the standard Intersection over Union criterion, the area under the FROC curve decreases by up to 9%. A novel matching criterion is proposed to improve tolerance to noise.
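The two ingredients of the noise simulation can be sketched directly: randomly enlarging a ground-truth box about its centre (up to the stated 6x factor), and the standard Intersection over Union used for proposal matching. The uniform scaling law below is an illustrative assumption, not necessarily the paper's exact sampling scheme:

```python
import random

def enlarge_box(box, max_factor=6.0, rng=random):
    """Simulate annotation noise by scaling a (x1, y1, x2, y2) box
    about its centre by a random factor in [1, max_factor]."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    s = rng.uniform(1.0, max_factor)
    hw, hh = (x2 - x1) / 2 * s, (y2 - y1) / 2 * s
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def iou(a, b):
    """Standard Intersection over Union between two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)
```

Since a box enlarged by factor s fully contains the original, its IoU with the clean annotation drops to 1/s^2, which is how the noise degrades the IoU-based matching during training.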

    Immersive Virtual Reality-Based Interfaces for Character Animation

    Virtual Reality (VR) has increasingly attracted the attention of the computer animation community in search of more intuitive and effective alternatives to the current sophisticated user interfaces. Previous works in the literature have already demonstrated the higher affordances offered by VR interaction, as well as the enhanced spatial understanding that arises thanks to the strong sense of immersion guaranteed by virtual environments. These factors have the potential to improve the animators' job, which is tremendously skill-intensive and time-consuming. The present paper explores the opportunities provided by VR-based interfaces for the generation of 3D animations via armature deformation. To the best of the authors' knowledge, this is the first tool that allows users to manage a complete pipeline supporting the above animation method, by letting them execute key tasks such as rigging, skinning and posing within a well-known animation suite using a customizable interface. Moreover, it is the first work to validate, in both objective and subjective terms, character animation performance in the above tasks and under realistic work conditions involving different user categories. In our experiments, task completion time was reduced by 26% on average, while maintaining almost the same levels of accuracy and precision for both novice and experienced users.

    PROTOtypical Logic Tensor Networks (PROTO-LTN) for Zero Shot Learning

    Semantic image interpretation can vastly benefit from approaches that combine sub-symbolic distributed representation learning with the capability to reason at a higher level of abstraction. Logic Tensor Networks (LTNs) are a class of neuro-symbolic systems based on a differentiable, first-order logic grounded into a deep neural network. LTNs replace the classical concept of training set with a knowledge base of fuzzy logical axioms. By defining a set of differentiable operators to approximate the role of connectives, predicates, functions and quantifiers, a loss function is automatically specified so that LTNs can learn to satisfy the knowledge base. We focus here on the subsumption or isOfClass predicate, which is fundamental to encode most semantic image interpretation tasks. Unlike conventional LTNs, which rely on a separate predicate for each class (e.g., dog, cat), each with its own set of learnable weights, we propose a common isOfClass predicate, whose level of truth is a function of the distance between an object embedding and the corresponding class prototype. PROTOtypical Logic Tensor Networks (PROTO-LTN) extend the current formulation by grounding abstract concepts as parametrized class prototypes in a high-dimensional embedding space, while reducing the number of parameters required to ground the knowledge base. We show how this architecture can be effectively trained in the few- and zero-shot learning scenarios. Experiments on Generalized Zero-Shot Learning benchmarks validate the proposed implementation as a competitive alternative to traditional embedding-based approaches. The proposed formulation opens up new opportunities in zero-shot learning settings, as the LTN formalism makes it possible to integrate background knowledge in the form of logical axioms to compensate for the lack of labelled examples.
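The shared isOfClass predicate can be illustrated with a distance-to-prototype truth function. The exponential kernel and the alpha scale below are illustrative assumptions; the paper's exact grounding may differ, but the principle (truth decays with the embedding-to-prototype distance) is the same:

```python
import math

def is_of_class(embedding, prototype, alpha=1.0):
    """Fuzzy truth value in (0, 1]: exactly 1 when the object
    embedding coincides with the class prototype, decaying
    smoothly with the squared Euclidean distance between them."""
    d2 = sum((e - p) ** 2 for e, p in zip(embedding, prototype))
    return math.exp(-alpha * d2)
```

Because every class is represented only by its prototype vector, adding a new class costs one prototype rather than a whole new learned predicate, which is what enables the zero-shot setting.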

    Bridging the gap between Natural and Medical Images through Deep Colorization

    Deep learning has thrived by training on large-scale datasets. However, in many applications, as for medical image diagnosis, getting massive amounts of data is still prohibitive due to privacy, lack of acquisition homogeneity and annotation cost. In this scenario, transfer learning from natural image collections is a standard practice that attempts to tackle shape, texture and color discrepancies all at once through pretrained model fine-tuning. In this work, we propose to disentangle those challenges and design a dedicated network module that focuses on color adaptation. We combine learning from scratch of the color module with transfer learning of different classification backbones, obtaining an end-to-end, easy-to-train architecture for diagnostic image recognition on X-ray images. Extensive experiments showed how our approach is particularly efficient in cases of data scarcity and provides a new path for further transferring the learned color information across multiple medical datasets.
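The role of a color-adaptation module can be shown with a minimal stand-in: a per-pixel learnable affine map from a single-channel X-ray to the 3 channels a natural-image backbone expects. The real module is a trainable network learned jointly with the backbone; the per-channel weights and biases here are a deliberately simplified assumption:

```python
def colorize(gray_image, weights, biases):
    """Map a single-channel image (list of rows of floats) to a
    3-channel image via a learnable per-channel affine transform:
    out[c] = weights[c] * pixel + biases[c]."""
    return [[[w * px + b for w, b in zip(weights, biases)]
             for px in row] for row in gray_image]
```

In the full architecture the three output maps would be fed to an ImageNet-pretrained backbone, with `weights` and `biases` (or their network-based generalization) optimized end to end with the classifier.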

    Faster-LTN: a neuro-symbolic, end-to-end object detection architecture

    The detection of semantic relationships between objects represented in an image is one of the fundamental challenges in image interpretation. Neuro-symbolic techniques, such as Logic Tensor Networks (LTNs), allow the combination of semantic knowledge representation and reasoning with the ability to efficiently learn from examples typical of neural networks. We here propose Faster-LTN, an object detector composed of a convolutional backbone and an LTN. To the best of our knowledge, this is the first attempt to combine both frameworks in an end-to-end training setting. This architecture is trained by optimizing a grounded theory which combines labelled examples with prior knowledge, in the form of logical axioms. Experimental comparisons show competitive performance with respect to the traditional Faster R-CNN architecture. Comment: accepted for presentation at ICANN 202
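The idea of "optimizing a grounded theory" reduces to maximizing the aggregated satisfaction of fuzzy axioms. A minimal sketch, assuming the common product t-norm for conjunction and a plain mean as aggregator (actual LTNs typically use smooth generalized-mean aggregators):

```python
def product_t_norm(a, b):
    """Fuzzy AND via the product t-norm, a common LTN grounding
    for logical conjunction over truth values in [0, 1]."""
    return a * b

def grounded_theory_loss(axiom_truths):
    """Surrogate training loss: 1 minus the mean satisfaction of all
    axioms in the knowledge base, so minimizing the loss maximizes
    overall satisfiability of the grounded theory."""
    return 1.0 - sum(axiom_truths) / len(axiom_truths)
```

During training, the truth value of each axiom (e.g., "every ground-truth box of class dog satisfies isOfClass dog") is recomputed from the detector's outputs at every step, and gradients flow back through the fuzzy operators into the convolutional backbone.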